********************************************************************************
***************************** Replication Code for *****************************
***************** "Competition for Attention in the ETF Space" *****************
********************************************************************************

The package includes three folders: "Codes", "Data", and "Functions".

I. "Codes"

	a) "replication_code.do" generates the main results in the paper

II. "Data"

	a) "etf_month_data.dta" contains the ETF-month-level variables; "pseudo_etf_month_data.dta" illustrates the format of the data
	
		1) "permno_etf" is the PERMNO of an ETF
		2) "date" is the end-of-month date
		3) "year" is the year of the date
		4) "month" is the month of the date
		5) "t" is an ETF's trading month relative to the launch month (t = 0)
		6) "delisting_date" is the date of an ETF's liquidation
		7) "delisted" is an indicator variable of whether an ETF is liquidated as of the end of 2019
		8) "q" is an indicator variable: (=0: broad-based, =1: specialized)
		9) "q4" is a categorical variable: (=1: broad-index, =2: smart-beta, =3: sector/industry, =4: thematic)
		10) "exp_ratio_m" is expense ratio
		11) "ret" is a monthly return
		12) "dlret" is a monthly delisting return
		13) "mktcap" is an ETF's AUM 
		14) "mktcap_lag" is an ETF's AUM in the previous month
		15) "ret_index" is a monthly return of an ETF's underlying index
		16) "rf" is the risk-free rate
		17) "mktrf" is the market return minus the risk-free rate
		18) "smb" is the SMB factor return
		19) "hml" is the HML factor return
		20) "rmw" is the RMW factor return
		21) "cma" is the CMA factor return
		22) "umd" is the UMD factor return
		23) "me" is the ME factor return
		24) "ia" is the I/A factor return
		25) "roe" is the ROE factor return
		26) "users_holdings" is the number of Robinhood owners
		27) "ior" is the 13F ownership
		28) "etf_ret_vw" is the average return of underlying stocks 
		29) "etf_exret_vw" is the average return in excess of the risk-free rate of underlying stocks 
		30) "etf_skew" is the average return skewness of underlying stocks
		31) "etf_size_vw" is the average size rank of underlying stocks  
		32) "etf_mb_vw" is the average market-to-book ratio of underlying stocks 
		33) "etf_psale_vw" is the average price-to-sales ratio of underlying stocks 
		34) "etf_evebitda_vw" is the average EV-to-EBITDA ratio of underlying stocks 
		35) "etf_sir_vw" is the average short interest ratio of underlying stocks 
		36) "etf_n_vw" is the average number of news articles on underlying stocks 
		37) "etf_css_vw" is the average sentiment score of news articles on underlying stocks
		38) "etf_sue_vw" is the average earnings surprises of underlying stocks 
		39) "etf_unprof_vw" is the average ratio of unprofitable firms among underlying stocks
		40) "etf_ltg_vw" is the average long-term growth forecast of underlying stocks
		41) "etf_fe_vw" is the average forecast error of underlying stocks
		42) "etf_users_vw" is the average number of Robinhood owners of underlying stocks
		
	b) "etf_data.dta" contains the ETF-level variables; "pseudo_etf_data.dta" illustrates the format of the data
	
		1) "permno_etf" is the PERMNO of an ETF
		2) "launch_date" is the date of the ETF launch
		3) "exp_ratio" is the expense ratio at the time of launch
		4) "turnover_6" is the average share turnover in the first six months since the launch
		5) "delisting_date" is the date of an ETF's liquidation
		6) "delisted" is an indicator variable of whether an ETF is liquidated as of the end of 2019
		7) "us_holdings" is the ratio of US stocks holdings
		8) "sim_all" is the cosine similarity between the ETF portfolio weights and the weights of the aggregate portfolio of all ETFs at the time of launch
		9) "sim_bb" is the cosine similarity between the ETF portfolio weights and the weights of the aggregate portfolio of broad-based ETFs at the time of launch
		10) "sim_sp" is the cosine similarity between the ETF portfolio weights and the weights of the aggregate portfolio of specialized ETFs at the time of launch
		11) "n_holdings" is the number of holdings at the time of launch
		12) "mkt_erxet" is the average market excess returns in the first 60 months since launch
		13) "aum_2002" is an ETF's AUM at the end of 2002
		14) "aum_2019" is an ETF's AUM at the end of 2019
		15) "aum_2019" is an ETF's revenue in 2019
		16) "q" is an indicator variable: (=0: broad-based, =1: specialized)
		17) "q4" is a categorical variable: (=1: broad-index, =2: smart-beta, =3: sector/industry, =4: thematic)
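
The cosine similarity behind "sim_all", "sim_bb", and "sim_sp" can be illustrated with a minimal Python sketch. It is not part of the replication package and may differ from how the authors computed these variables. The function name, the representation of portfolio weights as ticker-to-weight dictionaries, and the tickers and weights themselves are all hypothetical.

```python
import math


def cosine_similarity(w_etf, w_agg):
    """Cosine similarity between two portfolios given as {ticker: weight} dicts.

    Assumption: a ticker missing from one portfolio has weight 0 there.
    """
    tickers = set(w_etf) | set(w_agg)
    dot = sum(w_etf.get(k, 0.0) * w_agg.get(k, 0.0) for k in tickers)
    norm_etf = math.sqrt(sum(v * v for v in w_etf.values()))
    norm_agg = math.sqrt(sum(v * v for v in w_agg.values()))
    if norm_etf == 0.0 or norm_agg == 0.0:
        return float("nan")  # undefined when a portfolio has no holdings
    return dot / (norm_etf * norm_agg)


# Hypothetical weights: a two-stock ETF versus a three-stock aggregate portfolio.
etf = {"AAA": 0.5, "BBB": 0.5}
aggregate = {"AAA": 0.4, "BBB": 0.4, "CCC": 0.2}
print(round(cosine_similarity(etf, aggregate), 4))  # 0.9428
```

A value of 1 means the two weight vectors are proportional, and a value of 0 means the portfolios share no holdings.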
		
	c) "hedging_portfolios.dta" contains the time series of portfolio returns; "pseudo_hedging_portfolios.dta" illustrates the format of the data
	
		1) "exret_w" is the value-weighted portfolio return in excess of the risk-free rate; portfolios are formed based on exposure to specialized ETFs
		2) "mktrf" is the market return minus the risk-free rate
		3) "smb" is the SMB factor return
		4) "hml" is the HML factor return
		5) "rmw" is the RMW factor return
		6) "cma" is the CMA factor return
		7) "umd" is the UMD factor return
		8) "me" is the ME factor return
		9) "ia" is the I/A factor return
		10) "roe" is the ROE factor return
		
III. "Functions"

	a) "fig2.do" generates Figure 2 of the paper; this script is run from within "replication_code.do"
	b) "fig5.do" generates the data for Figure 5 of the paper; this script is run from within "replication_code.do"